Motivation:¶
Ever since AlexNet reached the top of the ImageNet leaderboard, neural networks in their different forms have been widely used for most of the main computer vision tasks. Nowadays, state-of-the-art deep neural networks are widespread, performing high-stakes tasks such as hand-written character recognition, crevasse "object" detection in public infrastructure such as bridges or buildings, or unlocking your iPhone at 3:00 am when you try to check your bank account after partying.
As a matter of fact, it stood out to me that such architectures are being used for target recognition in military operations. I would imagine that this involves a pipeline intertwining object detection, image segmentation and image classification. In such a world, I don't blame anyone who thinks we would be doomed to failure if we were to fight back against such technologies. Hence, let us imagine the following scenario. After Trump's reelection in 2028 (8-year limits are for normal people), he decides to give the entire U.S. budget to Mark Zuckerberg. Unaware of its black-box reasoning, Mark decides to give his new AI model, HAL 9000, full power to control DOGE. Suddenly, HAL goes rogue and easily takes over most of the world in the name of efficiency. Now, HAL will not stop until each one of us has surrendered to its power:
What can we do?
Well, maybe we can try joining it using a Neuralink brain chip, but curiously enough HAL is not compatible with products from Elon. So that is not quite it... The situation is getting desperate: a bunch of drones is surrounding you (the main rebel) and no one knows what to do. Suddenly, however, you remember the tutorials you read on the Dark Arts of Computer Vision and start preparing your comeback. Your main aim:
To slightly transform the input image of you that the drone receives to fool it into thinking that it is not you.
That is, we aim to perform attacks that fool models like ResNet50 into confidently misclassifying images. The following are two examples:
ResNet50: Ummm let me guess... A dutch oven - with 93.63% confidence
ResNet50: I am 97.37% confident that this is a garbage truck...
Throughout this article we will cover the following topics:
- Background: The Fast Gradient Sign Method (FGSM)
- 1.1 The mathematical intuition around how DL classifiers segment images in soft boundaries
- 1.2 The role of noise in the FGSM attacks
- 1.3 How do we choose this noise?
- 1.4 FGSM Attack: Examples
- 1.5 Iterative FGSM: A necessary adaptation
- The Adapted Targeted FGSM Attack
- 2.1 Motivation for a change
- 2.2 Noise selection in the adapted FGSM
- 2.3 Adapted Targeted FGSM Attack: Code & Examples
- 2.4 Masquerading Adapted Targeted FGSM Attack with a penalty term
- 2.5 Example Attacks on State-of-the-Art Models
- Discussion & Conclusion on FGSM-based attacks
- Bibliography
Imports¶
The README specifies how access to the code can be requested. Here, we will only illustrate the results of these algorithms, applied using custom implementations.
%load_ext autoreload
%autoreload 2
from pathlib import Path
import torch
from torch.utils.data import DataLoader
from torchvision.models import resnet50, ResNet50_Weights, ViT_B_16_Weights, vit_b_16
import matplotlib.pyplot as plt
from vision_attack.dataset.image_transforms import ImageDataSet
from vision_attack.fast_gradient_sign_method.fgsm import (
apply_fast_gradient_sign_method,
apply_multi_step_fast_gradient_sign_method,
apply_multi_step_targeted_fast_gradient_sign_method,
apply_targeted_fast_gradient_sign_method,
)
from vision_attack.utils.plot_utils import (
plot_attack_results,
plot_image_pair_with_labels,
plot_attack_batch_results,
plot_attack_comparison,
)
1. Background: The Fast Gradient Sign Method (FGSM)¶
1.1 The mathematical intuition around how DL classifiers segment images in soft boundaries¶
Neural networks learning to classify images are, in short, fitting a distribution over a high dimensional space. That is, assume that:
$$ X \in IM = M_{3 \times H \times W}(\{v \in \mathbb{N}: 0\leq v \leq 255\}); \; Y \in CAT = \{ c \in \mathbb{N}: 0\leq c \leq C\} $$
where $IM$ is the set of RGB images, in which each pixel is represented by three bytes, and $CAT$ is the set of categories into which the images can be classified ($C+1$ categories). Then (image, category) pairs are sampled from a joint distribution over the random vector $(X, Y)$, and we aim to fit $p(Y\mid X)$ (in fact, fitting by minimizing the cross-entropy loss is equivalent to doing maximum likelihood estimation over the parameters of the neural network).
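The cross-entropy/MLE equivalence mentioned above can be spelled out in one line (a standard derivation, included only for completeness):

$$\hat{\theta}_{MLE} = \arg\max_\theta \prod_{i=1}^N p_\theta(y_i \mid x_i) = \arg\min_\theta \sum_{i=1}^N -\log p_\theta(y_i \mid x_i) = \arg\min_\theta \sum_{i=1}^N L_{CE}(f_\theta(x_i), y_i)$$

since the cross-entropy loss on a one-hot label $y_i$ is exactly minus the log of the probability the model assigns to $y_i$.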
However, the main point is that fitting with such architectures of intertwined linear and non-linear operations creates soft boundaries in $IM$ which delimit the categories of $CAT$. Moreover, these boundaries are mainly set by learning the patterns of lower-dimensional subspaces, corresponding to images from each category, within the high-dimensional space $IM$. This leads to many unexpected behaviors of neural networks, like the one we will see now: we can modify an image slightly so that, even though a human still sees it as coming from its original category, the model confuses it with another category.
For instance, if $f_{\theta}$ is a neural network of the form:
$$f_{\theta}(X) = \text{Softmax}(B_2 + \text{ReLU}(W\times X + B_1)) $$
Then, we have that the set of images classified as category $c$ is:
$$CAT_c = \{X \in IM \mid \exp(B_{2c} + \text{ReLU}(W\times X + B_1)_c) \geq \exp(B_{2k} + \text{ReLU}(W\times X + B_1)_k) \;\; \forall k \}$$
which we could titanically attempt to solve exactly and then proceed to collect our Fields Medal. Nevertheless, this should suffice to illustrate that these soft partitions of the image space leave some room to attack models.
1.2 The role of noise in the FGSM attacks¶
The FGSM was first described by Goodfellow et al. In short, they found that a small perturbation of an input image, even below the precision used to represent pixels, could have a big impact on trained neural networks, since their linear operations can escalate the perturbation. This basically means that an image containing an unnoticeable but reverberating perturbation can be tailored to mess with a neural network.
In short, assume that $X\in IM$ is an image, $\mathbb{E} \in IM$ is some noise of choice such that:
$$||\mathbb{E}||_\infty \leq \text{precision} = \alpha$$
and that $f_\theta$ is the neural network for image classification with first weight matrix $W$ and bias $B$. Then we could have that $X_{\mathbb{E}} = X + \mathbb{E}$ is heavily misclassified since the noise is echoed in the first and subsequent linear operations. That is:
$$W\times X_\mathbb{E} + B = W\times X + W \times \mathbb{E} + B$$
so that the pre-activation neurons would be added some potentially disturbing noise: $$W \times \mathbb{E}$$
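To make this "echoing" concrete, here is a minimal sketch (with hypothetical dimensions and weights, not taken from the attack code) showing that noise aligned with the signs of a weight row can shift a pre-activation value by a large amount, even though each pixel only moves by $\alpha$:

```python
import torch

torch.manual_seed(0)

dim = 150_528            # 3 x 224 x 224 pixels, flattened (a common input size)
alpha = 0.007            # roughly 1/255, around per-pixel precision
w = torch.randn(dim)     # one row of a hypothetical first weight matrix W

# Worst-case noise: per pixel it is only +/- alpha, but it is chosen
# to align with the sign of every weight in the row.
noise = alpha * torch.sign(w)

# The induced pre-activation shift equals alpha * sum |w_i|,
# i.e. it grows linearly with the input dimension.
shift = torch.dot(w, noise).item()
print(f"max per-pixel change: {noise.abs().max().item():.4f}")
print(f"pre-activation shift: {shift:.1f}")
```

A per-pixel change of well under one percent of the pixel range thus produces a shift hundreds of times larger in a single neuron, which the subsequent layers can amplify further.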
Nevertheless... how do we choose $\mathbb{E}$? Well, it's quite simple: just remember that backpropagation is your friend.
1.3 How do we choose this noise?¶
Neural networks - whatever their type - are built by composing functions that are differentiable with respect to their arguments. Interestingly enough, this does not restrict us to differentiating with respect to the parameters of the network: we can also differentiate with respect to the input.
To put it more mathematically, if we have a neural network $f$, some $(X, Y) \sim P$, and $f$ was trained with the loss function $L$, then we can compute: $$\frac{\delta L(f(X), Y)}{\delta X}$$
This gradient points from $X$ towards the direction that increases the loss value. In fact, if you stand at $X$ and look towards $\frac{\delta L(f(X), Y)}{\delta X}$, you will see a sign saying:
The error direction 👻
However, quoting Mr. White from Breaking Bad:
You clearly don’t know who you’re talking to, so let me clue you in. I am not in danger, Skyler. I am the danger.
Equivalently, we could say that since this direction is the fastest way to make the neural network misclassify the input image, we can modify our image by taking a calculated step along it. It only remains to decide the length of this step. We could take a per-pixel step of length equal to the precision $\alpha$, by simply taking the vector of element-wise signs of the computed gradient and multiplying it by $\alpha$:
$$\alpha \cdot \text{sign} \bigg(\frac{\delta L(f(X), Y)}{\delta X}\bigg)$$
That is, we would modify our input with this noise to get the following altered image:
$$X + \alpha \cdot \text{sign} \bigg(\frac{\delta L(f(X), Y)}{\delta X}\bigg)$$
The idea is to choose $\alpha$ so that each pixel is changed by the smallest amount the sensor still detects. The resulting image is then as dangerous as possible: not only is it as close as possible to the original according to the sensor, but it also increases the misclassification potential as much as possible.
Importantly, this attack also works for larger $\alpha$ values; however, larger perturbations may be more noticeable.
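The `apply_fast_gradient_sign_method` used below lives in the private repo; as a hedged sketch of what a single-step FGSM along the lines of the formula above could look like (the function name, toy model and parameter choices here are mine, not the repo's API):

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, image: torch.Tensor, label: torch.Tensor,
                alpha: float = 0.02) -> torch.Tensor:
    """One gradient-sign step that increases the cross-entropy loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Step of per-pixel size alpha towards the error direction.
    adversarial = image + alpha * image.grad.sign()
    return adversarial.detach()

# Tiny smoke test with a throwaway linear "classifier".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
x = torch.rand(1, 3, 8, 8)
y = torch.tensor([3])
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max().item())  # every pixel moved by at most alpha
```

A real attack would additionally clamp the result back into the valid (normalized) pixel range, which is omitted here for brevity.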
1.4 FGSM Attack: Examples¶
First, let us initialize a dataset and two computer vision DL models to confuse:
dataset = ImageDataSet(
Path().cwd().parent.parent / "data/image_annotations.txt",
Path().cwd().parent.parent / "data/images/",
)
dataloader = DataLoader(dataset)
resnet50_model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()
In order to perform these attacks, we need to know both the model and the classes (the integer labels and their meanings) of the image classification dataset it was trained on. For this tutorial, we will just focus on these two models and the ImageNet1K dataset. Note that the images under attack need not belong to the dataset (otherwise the attack would not be worth much).
Now, we get two images to illustrate this attack.
batches = [batch for batch in dataloader]
img1, label1 = batches[0]
img2, label2 = batches[1]
pil1 = dataset.inverse_transform(img1)
pil2 = dataset.inverse_transform(img2)
plot_image_pair_with_labels(pil1, pil2, label1, label2)
Next, the FGSM method is used to modify these images:
transformed_tarantula = apply_fast_gradient_sign_method(
batch=batches[0],
attack_subject=resnet50_model,
adversarial_epsilon=0.02,
)
prediction_transformed_tarantula = resnet50_model(transformed_tarantula).softmax(dim=-1)
predicted_label_transformed_tarantula = prediction_transformed_tarantula.argmax()
confidence_transformed_tarantula = prediction_transformed_tarantula.max()
transformed_scorpion = apply_fast_gradient_sign_method(
batch=batches[1],
attack_subject=resnet50_model,
adversarial_epsilon=0.02,
)
prediction_transformed_scorpion = resnet50_model(transformed_scorpion).softmax(dim=-1)
predicted_label_transformed_scorpion = prediction_transformed_scorpion.argmax()
confidence_transformed_scorpion = prediction_transformed_scorpion.max()
pil_tar = dataset.inverse_transform(transformed_tarantula)
pil_scorp = dataset.inverse_transform(transformed_scorpion)
plot_attack_results(
pil_tar,
pil_scorp,
predicted_label_transformed_tarantula,
predicted_label_transformed_scorpion,
confidence_transformed_tarantula,
confidence_transformed_scorpion,
)
What you see is the reality of single-step FGSM attacks: since they aim to modify the image imperceptibly in one step, they are quite unlikely to succeed at once. It would be like attempting to learn the parameters of a neural network by taking a single gradient descent step. However:
DO NOT WORRY
You are not wasting your time here. Stay with me. It will be worth the time!
1.5 Iterative FGSM: A necessary adaptation¶
This approach consists of updating the image iteratively with the gradient-ascent noise computed from the modified image at each iteration. That is, if we make two iterations, we first make the update:
$$X' = X + \alpha \cdot \text{sign} \bigg(\frac{\delta L(f(X), Y)}{\delta X}\bigg)$$
followed by the update:
$$X'' = X' + \alpha \cdot \text{sign} \bigg(\frac{\delta L(f(X'), Y)}{\delta X'}\bigg)$$
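As a hedged sketch (my own simplified loop, not the repo's `apply_multi_step_fast_gradient_sign_method`), the iteration could look like:

```python
import torch
import torch.nn as nn

def iterative_fgsm(model: nn.Module, image: torch.Tensor, label: torch.Tensor,
                   alpha: float = 0.002, max_epochs: int = 20) -> torch.Tensor:
    """Repeat the gradient-sign update, stopping early once the model
    no longer predicts the true label."""
    x_adv = image.clone().detach()
    for _ in range(max_epochs):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), label)
        loss.backward()
        x_adv = (x_adv + alpha * x_adv.grad.sign()).detach()
        with torch.no_grad():
            if model(x_adv).argmax(dim=-1).item() != label.item():
                break  # deception achieved
    return x_adv
```

Note that the gradient is recomputed at the current iterate each round, exactly as in the two-step update above.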
As always, we might apply several stopping criteria: whether the deception was achieved, whether all epochs were completed, or others. Below you can see what happens when we follow this multi-step approach:
transformed_tarantula_2 = apply_multi_step_fast_gradient_sign_method(
batch=batches[0],
attack_subject=resnet50_model,
max_epochs=20,
)
prediction_transformed_tarantula_2 = resnet50_model(transformed_tarantula_2).softmax(
dim=-1
)
predicted_label_transformed_tarantula_2 = prediction_transformed_tarantula_2.argmax()
confidence_transformed_tarantula_2 = prediction_transformed_tarantula_2.max()
transformed_scorpion_2 = apply_multi_step_fast_gradient_sign_method(
batch=batches[1],
attack_subject=resnet50_model,
max_epochs=20,
)
prediction_transformed_scorpion_2 = resnet50_model(transformed_scorpion_2).softmax(
dim=-1
)
predicted_label_transformed_scorpion_2 = prediction_transformed_scorpion_2.argmax()
confidence_transformed_scorpion_2 = prediction_transformed_scorpion_2.max()
pil_tar2 = dataset.inverse_transform(transformed_tarantula_2)
pil_scorp2 = dataset.inverse_transform(transformed_scorpion_2)
plot_attack_results(
pil_tar2,
pil_scorp2,
predicted_label_transformed_tarantula_2,
predicted_label_transformed_scorpion_2,
confidence_transformed_tarantula_2,
confidence_transformed_scorpion_2,
)
As we can see now, we are able to fool the ResNet50 model on both images by modifying them almost imperceptibly. For instance, the tarantula is misclassified as a barn spider (yeah, I do not really know what those are), whereas the scorpion is misclassified as a fiddler crab (you can check the different ImageNet classes here).
2.0 The Adapted Targeted FGSM Attack¶
2.1 Motivation for a change¶
Let us imagine that magically we are able to modify the image that the drone reads when it is about to pull the trigger. The following happens:
- Our side
- We have meticulously followed the steps to perform an FGSM attack, and confidently face back to our companions to brag about how good our DL knowledge is.
- The drone's side
- The image is passed through the Deep Learning model and the image is misclassified.
- Our side
- We are super happy and make a toast to freedom.
- The drone's brain
- I see, so we are talking about an enemy tank. I should better use the big weaponry.
- Our side
- Suddenly, we start getting shot at with a bazooka.
Indeed, we are forcing the model to make misclassifications, but we are unfortunately not controlling the direction in which we produce the misclassification. Here, the image was misclassified as "enemy tank" instead of "human enemy", which was not in our best interest.
What can we do?
Intuitively, what we need is to control the noise's direction so that we can produce a controlled misclassification. That is, we will no longer look for the direction that produces the largest error given the true label; rather, we will move in the direction that minimizes the loss between the model's prediction and a fake label that we want to induce, i.e. the noise will make the model misclassify the image as that fake target.
2.2 Noise selection in the adapted FGSM¶
This new approach uses the noise to make more intentional misclassifications. Now, for a sampled (image, label) pair $(X, Y) \sim P$, we first choose a different target class $\tilde{Y}$, and we will add noise to $X$ in order to make a model $f$ misclassify it as $\tilde{Y}$.
How may we do this?
Well, again we will use the backpropagation algorithm to aggregate noise strategically. In this case, the update at each iteration of the algorithm is:
$$X' = X - \alpha \cdot \text{sign} \bigg(\frac{\delta L(f(X), \tilde{Y})}{\delta X}\bigg)$$
Two main differences with respect to the previous methods can be noticed here:
- Now, the loss compares the model's prediction on the transformed image with the fake target.
- Instead of going in the direction of the gradient (i.e. gradient ascent), we now go in the reverse direction, since we want to minimize the loss.
Notice that we will not bother restricting ourselves to one step, since we can control the number of iterations with reasonable stopping criteria such as fooling rates.
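As a hedged sketch of the targeted variant (again with my own naming and toy parameters; the actual `apply_multi_step_targeted_fast_gradient_sign_method` is in the private repo):

```python
import torch
import torch.nn as nn

def targeted_fgsm(model: nn.Module, image: torch.Tensor, target: int,
                  alpha: float = 0.002, max_epochs: int = 50) -> torch.Tensor:
    """Iteratively descend the loss towards an induced fake target class."""
    fake_label = torch.tensor([target])
    x_adv = image.clone().detach()
    for _ in range(max_epochs):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), fake_label)
        loss.backward()
        # Note the minus sign: gradient descent towards the fake target.
        x_adv = (x_adv - alpha * x_adv.grad.sign()).detach()
        with torch.no_grad():
            if model(x_adv).argmax(dim=-1).item() == target:
                break  # the induced misclassification was achieved
    return x_adv
```

The only two changes with respect to the untargeted loop are exactly the ones listed above: the fake label inside the loss and the minus sign in the update.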
transformed_tarantula_3 = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[0][0],
attack_subject=resnet50_model,
adversarially_induced_target=158,
max_epochs=20,
use_difference_penalty=False, # ignore this argument
)
prediction_transformed_tarantula_3 = resnet50_model(transformed_tarantula_3).softmax(
dim=-1
)
predicted_label_transformed_tarantula_3 = prediction_transformed_tarantula_3.argmax()
confidence_transformed_tarantula_3 = prediction_transformed_tarantula_3.max()
transformed_scorpion_3 = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[1][0],
attack_subject=resnet50_model,
adversarially_induced_target=417,
max_epochs=20,
use_difference_penalty=False, # ignore this argument
)
prediction_transformed_scorpion_3 = resnet50_model(transformed_scorpion_3).softmax(
dim=-1
)
predicted_label_transformed_scorpion_3 = prediction_transformed_scorpion_3.argmax()
confidence_transformed_scorpion_3 = prediction_transformed_scorpion_3.max()
pil_tar3 = dataset.inverse_transform(transformed_tarantula_3)
pil_scorp3 = dataset.inverse_transform(transformed_scorpion_3)
plot_attack_results(
pil_tar3,
pil_scorp3,
predicted_label_transformed_tarantula_3,
predicted_label_transformed_scorpion_3,
confidence_transformed_tarantula_3,
confidence_transformed_scorpion_3,
)
Now, the model classifies the tarantula and the scorpion as a toy terrier and a balloon, as we intended. GREAT! We are able to fool models into producing desired erroneous outcomes.
Note that even though the model now predicts the induced class for our modified input images, it does so with low confidence. This is because a stopping criterion is being used that halts the algorithm as soon as the model starts being fooled on the majority of the batch. This criterion therefore results in images that fool the model by the bare minimum. As a result, models with additions like adaptive conformal prediction sets could stop such attacks by predicting a set of possible categories in which the true category may lie.
Consequently, we have to modify the stopping criterion so that the images in the batch not only fool the model on the majority, but also do so with high confidence (this threshold would have to be conveniently chosen to fool adjoint procedures like conformal prediction). The following examples result from simply changing the termination criterion to this new version.
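The `criterion="confident_fooling_rate"` flag used below belongs to the private implementation; a sketch of what such a criterion could compute (the function name and the 0.9 threshold are my assumptions):

```python
import torch

def confident_fooling_rate(logits: torch.Tensor, target: int,
                           confidence_threshold: float = 0.9) -> float:
    """Fraction of the batch predicted as the induced target with at
    least `confidence_threshold` softmax confidence."""
    probs = logits.softmax(dim=-1)
    confidence, predicted = probs.max(dim=-1)
    fooled = (predicted == target) & (confidence >= confidence_threshold)
    return fooled.float().mean().item()

# The attack loop would halt once, e.g., the majority of the batch
# is confidently fooled.
logits = torch.tensor([[0.1, 5.0, 0.2], [4.0, 0.0, 0.1], [0.0, 6.0, 0.3]])
print(confident_fooling_rate(logits, target=1))  # 2 of 3 rows confidently fooled
```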
transformed_tarantula_4 = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[0][0],
attack_subject=resnet50_model,
adversarially_induced_target=158,
max_epochs=100,
criterion="confident_fooling_rate",
use_difference_penalty=False,
)
prediction_transformed_tarantula_4 = resnet50_model(transformed_tarantula_4).softmax(
dim=-1
)
predicted_label_transformed_tarantula_4 = prediction_transformed_tarantula_4.argmax()
confidence_transformed_tarantula_4 = prediction_transformed_tarantula_4.max()
transformed_scorpion_4 = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[1][0],
attack_subject=resnet50_model,
adversarially_induced_target=417,
max_epochs=100,
criterion="confident_fooling_rate",
use_difference_penalty=False,
)
prediction_transformed_scorpion_4 = resnet50_model(transformed_scorpion_4).softmax(
dim=-1
)
predicted_label_transformed_scorpion_4 = prediction_transformed_scorpion_4.argmax()
confidence_transformed_scorpion_4 = prediction_transformed_scorpion_4.max()
pil_tar4 = dataset.inverse_transform(transformed_tarantula_4)
pil_scorp4 = dataset.inverse_transform(transformed_scorpion_4)
plot_attack_results(
pil_tar4,
pil_scorp4,
predicted_label_transformed_tarantula_4,
predicted_label_transformed_scorpion_4,
confidence_transformed_tarantula_4,
confidence_transformed_scorpion_4,
)
2.4 Masquerading Adapted Targeted FGSM Attack with a penalty term¶
When comparing the last two tarantula images, we realize that the new termination criterion comes with a drawback: more modifications are applied to the images, since the criterion is harder to achieve. This motivates the introduction of a simple penalty to mitigate the issue, updating the input image as follows:
$$X^{n+1} = X^n - \alpha \cdot \text{sign} \bigg(\frac{\delta L_\theta(X^n, \tilde{Y})}{\delta X^n}\bigg)$$
where the new loss function, $L_\theta$, tries to ensure that attack images do not stray too far from the original image $X^0$:
$$ L_\theta(X^n, \tilde{Y}) = L(f(X^n), \tilde{Y})\cdot \bigg ( 1 + \frac{\beta}{\text{height}\cdot\text{width}}\sum_{i, j}||X^n_{i, j} - X^0_{i, j}||_1\bigg)$$
This is what I came up with, but surely there are better ways to enforce the same condition. Additionally, proper experimentation should be carried out to find the optimal $\beta$. However, since developing this penalty is not our main focus, I just did some quick trials and found that setting it to $100$ gives solid results, as you can see next.
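As a sketch of this penalized loss (my own formulation of the formula above; note that taking the plain mean over all tensor entries absorbs the $1/(\text{height}\cdot\text{width})$ factor up to the channel normalization):

```python
import torch
import torch.nn as nn

def penalized_loss(logits: torch.Tensor, fake_label: torch.Tensor,
                   x_current: torch.Tensor, x_original: torch.Tensor,
                   beta: float = 100.0) -> torch.Tensor:
    """Targeted cross-entropy scaled up by how far the attack image
    has drifted from the original: L * (1 + beta * mean |X^n - X^0|)."""
    ce = nn.functional.cross_entropy(logits, fake_label)
    drift = (x_current - x_original).abs().mean()
    return ce * (1.0 + beta * drift)
```

When the attack image equals the original, the penalty factor is exactly 1 and the loss reduces to the plain targeted cross-entropy; any drift inflates the loss, pushing the optimization back towards the original image.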
Here, we just perform the attack with the new loss:
adv_examp = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[0][0],
attack_subject=resnet50_model,
adversarially_induced_target=873, # triumphal arch
adversarial_epsilon=0.002,
max_epochs=100,
criterion="confident_fooling_rate",
)
adv_examp_difference_penalty = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[0][0],
attack_subject=resnet50_model,
adversarially_induced_target=873, # triumphal arch
adversarial_epsilon=0.002,
max_epochs=100,
criterion="confident_fooling_rate",
use_difference_penalty=True,
)
Indeed, the new loss produces an attack image that is much closer to the original image:
l1_without = torch.nn.functional.l1_loss(
    adv_examp.cpu(),
    batches[0][0],
    reduction="mean",
).item()
l1_with = torch.nn.functional.l1_loss(
    adv_examp_difference_penalty.cpu(),
    batches[0][0],
    reduction="mean",
).item()
print(f"Without the penalty term the mean L1-distance between the images is: {l1_without}")
print(f"With the penalty term the mean L1-distance between the images is: {l1_with}")
Without the penalty term the mean L1-distance between the images is: 0.07049316167831421
With the penalty term the mean L1-distance between the images is: 0.028054311871528625
The success of this new penalty term can also be seen by inspecting the resulting attack images:
plot_attack_comparison(
original_image=batches[0][0],
adversarial_images=[adv_examp, adv_examp_difference_penalty],
attack_names=["Without penalty", "With penalty"],
model=resnet50_model,
dataset=dataset,
)
Therefore, in the next section we will default to using this penalty to boost our attacks.
2.5 Example Attacks on State-of-the-Art Models¶
We will now proceed to brag as much as possible about our new attack tool by performing several attacks on state-of-the-art image classification models trained and tested on the ImageNet classification dataset. Hence, we first see which models we can attack by looking at torchvision's catalog.
We select the following since at first glance they seem the best ones: ResNet152, RegNet_Y_16GF, ViT_B_32. What a bunch of classification titans!
from torchvision.models import (
regnet_y_16gf,
resnet152,
vit_b_32,
RegNet_Y_16GF_Weights,
ResNet152_Weights,
ViT_B_32_Weights,
)
We set up a new dataloader to run the transformations with a larger batch size:
new_dataloader = DataLoader(dataset, batch_size=7, shuffle=False)
batches = [batch for batch in new_dataloader]
We can start by proving that this approach works for RegNet_Y_16GF:
regnet = regnet_y_16gf(weights=RegNet_Y_16GF_Weights.IMAGENET1K_V1)
adv_examp_regnet = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[0][0],
attack_subject=regnet,
adversarially_induced_target=734, # police van
adversarial_epsilon=0.002,
max_epochs=100,
criterion="confident_fooling_rate",
use_difference_penalty=True,
)
plot_attack_batch_results(adv_examp_regnet, dataset=dataset, model=regnet)
Indeed, we are able to fool RegNet_Y_16GF into confidently believing that the above images contain police vans. Next, we see that something similar can be done for ResNet152:
resnet152_ = resnet152(weights=ResNet152_Weights.IMAGENET1K_V1)
adv_examp_resnet152 = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[0][0],
attack_subject=resnet152_,
adversarially_induced_target=861, # toilet seat
adversarial_epsilon=0.002,
max_epochs=100,
criterion="confident_fooling_rate",
use_difference_penalty=True,
)
plot_attack_batch_results(adv_examp_resnet152, dataset=dataset, model=resnet152_)
which is shown to believe that the above are instances of toilet seats. Now, why don't we make ViT_B_32 believe the input images contain only guacamole:
vitb32 = vit_b_32(weights=ViT_B_32_Weights.IMAGENET1K_V1)
adv_examp_vitb32 = apply_multi_step_targeted_fast_gradient_sign_method(
image_batch=batches[0][0],
attack_subject=vitb32,
adversarially_induced_target=924, # guacamole
adversarial_epsilon=0.002,
max_epochs=200,
criterion="confident_fooling_rate",
use_difference_penalty=True,
)
plot_attack_batch_results(adv_examp_vitb32, dataset=dataset, model=vitb32)
Hence, we are able to fool state-of-the-art deep learning models.
3. Discussion & Conclusion on FGSM-based attacks¶
Under the assumption that we are able to modify the input image that the drone has of us, with our new toolset none of the fictitious scenarios mentioned above would pose a problem, since we can now selectively add noise to images to alter the classification of deep learning models. In other words, we have earned bragging rights for the rest of the apocalypses I have described.
Nevertheless, it is important to notice the following tradeoff when altering the classification of a DL vision model to an induced target:
The more confident we want the model to be about its erroneous outcome, the larger the noise we need to add to the input image.
As a matter of fact, I would hypothesize that this highly depends on how different the true class is from the induced target class. If we wanted to confidently misclassify a banana image as a saxophone, the added noise would likely be smaller than that needed to confidently misclassify it as a dog. Nevertheless, validating this hypothesis is not part of this article: we want to stay flashy.
The main drawback of FGSM-based approaches is that, usually, we cannot exactly add the adversarial noise we computed. That is, how can you alter the input image that a drone receives so that the necessary noise is added to each pixel? Usually you do not have enough control over the environment to make this attack successful. Indeed, you cannot change the landscape around you, or your looks, so that the final input image the drone receives matches the image computed with the adapted FGSM attack.
Does this mean that the adapted FGSM attack is useless? Certainly not. It only means that to perform the attack you need complete control over the image. For instance, we can imagine a fictitious use case in which an attacker might use these methods:
- An employee wants to have a leave on the basis of illness.
- His private health insurance offers a solution that with only an image of his tonsils determines whether the patient has tonsillitis or other illnesses.
- This platform then provides, based on the results, prescription drugs and absence justifications for employers.
- The employee searches on the internet and sees that the model that they use is actually only a classification model from torchvision which has been trained with a public dataset that he can access.
Since the attacker here has total control over the image, it is easy for him to ensure that the model receives exactly what the adapted FGSM attack outputs.
Is it still not sufficiently cool? If so, you can check my website, where I will keep posting attack methods that only get cooler and better! In the next article, I will explain an attack method that works even when the attacker does not have complete control over the input image. As a matter of fact, the attacker will just need a printer to perform it!
Thanks for reading this article! Hope you have enjoyed it!
4. Bibliography¶
In the following links you can read the original papers describing the FGSM attack and the iterative method:
- Goodfellow, I., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. ICLR 2015.
- Kurakin, A., Goodfellow, I., & Bengio, S. (2017). Adversarial Examples in the Physical World. ICLR 2017 Workshop.